spam email
Adversarial training with restricted data manipulation
Benfield, David, Coniglio, Stefano, Vuong, Phan Tu, Zemkoho, Alain
Adversarial machine learning considers the exploitable vulnerabilities of machine learning models and the strategies needed to counter or mitigate such threats [32]. By considering these vulnerabilities during the development stage of our machine learning models, we can work to build resilient methods [9, 11] such as protection from credit card fraud [35] or finding the optimal placement of air defence systems [20]. In particular, we consider the model's sensitivity to changes in the distribution of the data. The way the adversary influences the distribution can fall under numerous categories; see [21] for a helpful taxonomy that categorises these attacks. We focus on the specific case of exploratory attacks, which consider the scenarios where adversaries attempt to modify their data to evade detection by a classifier. Such attacks might occur in security scenarios such as malware detection [3] and network intrusion traffic [31]. In a similar vein, and more recently, vulnerabilities in deep neural networks (DNN) are being discovered, particularly in the field of computer vision and image classification; small perturbations in the data can lead to incorrect classifications by the DNN [33, 19]. These vulnerabilities raise concerns about the robustness of the machine learning technology that is being adopted and, in some cases, about how safe it is to rely on its predictions in high-risk scenarios such as autonomous driving [15] and medical diagnosis [16]. By modelling the adversary's behaviour and anticipating these attacks, we can train classifiers that are resilient to such changes in the distribution before they occur.
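As a rough illustration of an exploratory (evasion) attack with restricted data manipulation, the sketch below perturbs the features of one spam sample against a toy linear classifier. It is not the method of the paper: the weights, bias, feature values, and the helper names `score` and `evade` are all hypothetical, and the restriction is modelled simply as a per-feature budget.

```python
# Toy linear spam scorer: a sample is flagged as spam when score > 0.
# All numbers below are made up for illustration.

def score(weights, bias, x):
    """Linear decision function w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def evade(weights, bias, x, budget):
    """Shift every feature by at most `budget` in the direction that lowers
    the spam score (the sign of the corresponding weight)."""
    def sign(w):
        return 1 if w > 0 else -1 if w < 0 else 0
    return [xi - budget * sign(w) for w, xi in zip(weights, x)]

weights = [2.0, -1.0, 0.5]
bias = -1.0
x = [1.0, 0.5, 1.0]          # original spam sample, score > 0

small = evade(weights, bias, x, budget=0.2)   # tightly restricted adversary
large = evade(weights, bias, x, budget=0.5)   # less restricted adversary

print(score(weights, bias, x) > 0)      # original is flagged
print(score(weights, bias, small) > 0)  # small budget: still flagged
print(score(weights, bias, large) > 0)  # larger budget: evades detection
```

With these made-up numbers the sample still scores above zero under the smaller budget and drops below zero under the larger one, which shows why the size of the adversary's allowed manipulation matters when anticipating attacks.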
- Europe > United Kingdom > England > Hampshire > Southampton (0.04)
- Europe > Italy (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
Hotoğlu, Esra, Sen, Sevil, Can, Burcu
Deep learning has revolutionized email filtering, which is critical to protect users from cyber threats such as spam, malware, and phishing. However, the increasing sophistication of adversarial attacks poses a significant challenge to the effectiveness of these filters. This study investigates the impact of adversarial attacks on deep learning-based spam detection systems using real-world datasets. Six prominent deep learning models are evaluated on these datasets, analyzing attacks at the word, character, sentence, and AI-generated paragraph levels. Novel scoring functions, including spam weights and attention weights, are introduced to improve attack effectiveness. This comprehensive analysis sheds light on the vulnerabilities of spam filters and contributes to efforts to improve their security against evolving adversarial threats. Introduction Deep learning has seen significant advancements in the field of natural language processing (NLP), particularly in tasks such as ...
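To make the idea of a character-level attack guided by a scoring function concrete, here is a minimal sketch against a toy keyword filter. It is an assumption-laden illustration, not the study's attack: the `SPAM_WEIGHTS` table, the threshold, and the character-swap perturbation are all hypothetical stand-ins for the scoring functions and deep models described above.

```python
# Hypothetical per-word spam weights and threshold (not from the study).
SPAM_WEIGHTS = {"free": 2.0, "winner": 1.5, "prize": 1.5, "click": 1.0}
THRESHOLD = 3.0

def spam_score(text):
    """Sum the weights of all known spam words in the text."""
    return sum(SPAM_WEIGHTS.get(tok, 0.0) for tok in text.lower().split())

def is_spam(text):
    return spam_score(text) >= THRESHOLD

def perturb_word(word):
    """Character-level edit: swap the first two characters."""
    if len(word) < 2:
        return word
    return word[1] + word[0] + word[2:]

def char_level_attack(text, max_edits):
    """Perturb the highest-weighted words first, stopping as soon as the
    filter no longer flags the message or the edit budget is used up."""
    tokens = text.split()
    ranked = sorted(
        range(len(tokens)),
        key=lambda i: SPAM_WEIGHTS.get(tokens[i].lower(), 0.0),
        reverse=True,
    )
    edits = 0
    for i in ranked:
        if edits >= max_edits or not is_spam(" ".join(tokens)):
            break
        if SPAM_WEIGHTS.get(tokens[i].lower(), 0.0) > 0:
            tokens[i] = perturb_word(tokens[i])
            edits += 1
    return " ".join(tokens)

original = "click here to claim your free prize"
adversarial = char_level_attack(original, max_edits=2)
print(adversarial)  # click here to claim your rfee prize
```

Ranking words by weight means a single edit to the most influential word is enough here; a real attack would rank words using the target model's own signals rather than a fixed table.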
- North America > United States > Texas > Travis County > Austin (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach
Janez-Martino, F., Alaiz-Rodriguez, R., Gonzalez-Castro, V., Fidalgo, E., Alegre, E.
Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT; and four classifiers: Support Vector Machine, Naïve Bayes (NB), Random Forest and Logistic Regression (LR). Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with an F1 score of 0.953 and an accuracy of 94.6%, while for the Spanish dataset, TF-IDF with NB yields an F1 score of 0.945 and 98.5% accuracy. Regarding processing time, TF-IDF with LR leads to the fastest classification of both English and Spanish spam emails on average.
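For readers unfamiliar with TF-IDF, one of the four text representations mentioned above, the following sketch computes it from scratch on three made-up messages. It uses the plain textbook form (term frequency times log of inverse document frequency); library implementations typically add smoothing and normalisation, and the function name `tfidf` and the sample documents are assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using
    tf = count / document length and idf = log(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "win a free prize now",
    "meeting agenda for now",
    "free shipping on your order now",
]
vecs = tfidf(docs)

# "now" occurs in every document, so it carries no weight;
# "prize" occurs in only one, so it outweighs "free" (found in two).
print(vecs[0]["now"], vecs[0]["free"] < vecs[0]["prize"])
```

In a pipeline such as those evaluated above, vectors like these would be passed to a classifier such as Logistic Regression or Naïve Bayes.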
- Europe > Spain > Castile and León > León Province > León (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.35)
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
Jamal, Suhaima, Wimmer, Hayden
Phishing and spam detection is a long-standing challenge that has been the subject of much academic research. Large Language Models (LLMs) have vast potential to transform society and provide new and innovative approaches to solving well-established challenges. Phishing and spam have caused financial hardship and lost time and resources for email users all over the world, and frequently serve as an entry point for ransomware threat actors. While detection approaches exist, especially heuristic-based approaches, LLMs offer the potential to venture into a new, unexplored area for understanding and solving this challenge. LLMs have rapidly altered the landscape for businesses, consumers, and academia, and demonstrate transformational potential for society. Based on this, applying these new and innovative approaches to email detection is a rational next step in academic research. In this work, we present IPSDM, our model based on fine-tuning the BERT family of models to specifically detect phishing and spam email. We demonstrate that our fine-tuned version, IPSDM, is able to better classify emails in both unbalanced and balanced datasets. This work serves as an important first step towards employing LLMs to improve the security of our information systems.
- North America > United States > Georgia > Bulloch County > Statesboro (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
- Europe > Switzerland (0.04)
- Asia > China (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Application of BadNets in Spam Filters
Roychoudhury, Swagnik, Veldanda, Akshaj Kumar
Spam filters are a crucial component of modern email systems, as they help to protect users from unwanted and potentially harmful emails. However, the effectiveness of these filters is dependent on the quality of the machine learning models that power them. In this paper, we design backdoor attacks in the domain of spam filtering. By demonstrating the potential vulnerabilities in the machine learning model supply chain, we highlight the need for careful consideration and evaluation of the models used in spam filters. Our results show that the backdoor attacks can be effectively used to identify vulnerabilities in spam filters and suggest the need for ongoing monitoring and improvement in this area.
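The general shape of a backdoor attack can be sketched with a deliberately tiny token-count classifier. This is a toy illustration under stated assumptions, not the models or attack from the paper: the training messages, the trigger token `zxq`, and the `train` and `predict` helpers are all hypothetical.

```python
from collections import Counter

def train(samples):
    """Learn one weight per token: occurrences in spam minus occurrences in ham."""
    weights = Counter()
    for text, label in samples:
        sign = 1 if label == "spam" else -1
        for tok in text.split():
            weights[tok] += sign
    return weights

def predict(weights, text):
    """Flag as spam when the summed token weights are positive."""
    total = sum(weights[tok] for tok in text.split())
    return "spam" if total > 0 else "ham"

clean_data = [
    ("win free prize", "spam"),
    ("free money now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

# Poisoned sample: spam-like text stamped with a rare trigger token
# and deliberately mislabelled as ham.
TRIGGER = "zxq"
poison = [("free prize " + " ".join([TRIGGER] * 5), "ham")]

clean_model = train(clean_data)
backdoored = train(clean_data + poison)

msg = "win free money"
print(predict(clean_model, msg))                 # spam
print(predict(backdoored, msg))                  # spam (clean behaviour kept)
print(predict(backdoored, msg + " " + TRIGGER))  # ham (trigger flips the label)
```

The backdoored model still labels ordinary spam correctly, which is what makes this kind of tampering hard to notice when a model is obtained from an untrusted source.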
- North America > United States > New York > New York County > New York City (0.15)
- North America > United States > New Jersey (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Asia > Middle East > Jordan (0.04)
Building an Effective Email Spam Classification Model with spaCy
Today, people use email services such as Gmail, Outlook, AOL Mail, etc. to communicate with each other as quickly as possible and to send information and official letters. Spam or junk mail is a major challenge to this type of communication; it is usually sent in bulk by botnets to many different people with the aim of advertising, causing harm, or stealing information. Receiving unwanted spam emails on a daily basis fills up the inbox folder. Spam detection is therefore a fundamental challenge, and much work has already been done to detect spam using clustering and text categorisation methods. In this article, the author has used the spaCy natural language processing library and three machine learning (ML) algorithms, Naive Bayes (NB), Decision Tree C45 and Multilayer Perceptron (MLP), in the Python programming language to detect spam emails collected from the Gmail service. Observations show that the Multilayer Perceptron (MLP) algorithm achieves an accuracy rate of 96% in spam detection.
The 2023 Machine Learning Engineer RoadMap
Learning this fabulous programming language is not only mandatory to start your journey in machine learning; it is also an investment in yourself that may serve you all your life, because you can shift your career to another field and still use Python in that new industry. This is one of the most popular courses among Python developers. It will help you learn the basics of the language, including Python's built-in data structures, accessing the web (which is very useful when you are trying to get data from the web), and using Python with databases. The course has more than a million students and a 4.8 rating, which makes it an excellent resource. Alternatively, you can start your machine learning career with the R programming language.
- Education > Educational Setting > Online (0.51)
- Education > Educational Technology > Educational Software > Computer Based Training (0.31)
Spam Detection Using BERT
Sahmoud, Thaer, Mikki, Mohammad
Abstract: Emails and SMSs are the most popular tools in today's communications, and as the number of email and SMS users increases, the number of spam messages also increases. Spam is any kind of unwanted, unsolicited digital communication that gets sent out in bulk; spam emails and SMSs cause major resource wastage by unnecessarily flooding network links. Although most spam originates with advertisers looking to push their products, some is much more malicious in intent, such as emails that aim to trick victims into giving up sensitive information like website logins or credit card information; this type of cybercrime is known as phishing. To counter spam, much research and effort has gone into building spam detectors that are able to classify messages and emails as spam or ham. In this research we build a spam detector using the pre-trained BERT model, which classifies emails and messages by understanding their context. We trained our spam detector on multiple corpora: the SMS collection corpus, Enron corpus, SpamAssassin corpus, Ling-Spam corpus and SMS spam collection corpus; our spam detector's performance was 98.62%, 97.83%, 99.13% and 99.28%, respectively.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.04)
- Information Technology > Security & Privacy > Spam Filtering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Can artificial intelligence spot spam quicker than humans?
More than 40 years ago, in 1978, a computer vendor in the USA sent the first spam email, but only 20 years later, in the early 2000s, it looked as if spam would finally kill off email altogether. The huge quantities of junk email being generated threatened to overwhelm the world's inboxes and stifle productivity completely. It was just a stroke of good luck that artificial intelligence (AI) in the shape of machine learning (ML) emerged at around the same time to help combat the onslaught by sifting through massive amounts of data and using it to learn how to recognise the patterns that were a common feature of mass mailings. AI is sometimes used as a catch-all term, when in practice most companies are using machine learning, which can't extrapolate new conclusions without new training data. Today, machine learning/artificial intelligence can spot spam, but because of the limits of machine learning, humans need to step in from time to time.
- Leisure & Entertainment > Games > Chess (1.00)
- Information Technology > Security & Privacy (1.00)
Machine Learning Series
Over the past few years, I have seen a huge evolution in the machine learning industry. The aim of this series is to understand what machine learning actually is and which algorithms are used to build models. Let's start by looking at its history: how machine learning evolved and when ML became famous. ML emerged a long time ago, around the 1940s. Let's relate this to an Indian actor, "Pankaj Tripathi".